Data Analysis with Python Certification | freeCodeCamp.org
We're basically trying to transform data into information
This is where Python and the PyData tools excel
This is where pandas excels
Modeling Data
Adapting real-life scenarios to information systems, using inferential statistics to see whether any pattern or model arises.
You can work with data from Excel, CSV, XML, and APIs inside Jupyter Notebooks
Oftentimes, you will not be working directly with NumPy.
You'll use pandas and Matplotlib instead, and they are both built on top of NumPy.
Pure Python is not the right tool for numeric computation on large datasets (its loops and number objects are slow)
That's why NumPy exists: a very efficient numeric processing library that sits on top of Python.
Multi-indexing in NumPy
NumPy needs to know the type (dtype) of the objects you're storing
NumPy stores numbers, dates, and Booleans, but not regular individual Python objects
#myquestion What exactly is a "regular individual object"?
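A minimal sketch of how NumPy infers a dtype for different kinds of values. Reading "regular individual objects" as instances of an arbitrary Python class is an assumption, and the Point class is hypothetical. For such values NumPy falls back to the generic object dtype, which only stores references and loses most of NumPy's speed benefits.

```python
import numpy as np

# NumPy infers one fixed, homogeneous dtype from the values you pass in
floats = np.array([1.5, 2.0, 3.25])
bools = np.array([True, False, True])
dates = np.array(["2024-01-01", "2024-03-15"], dtype="datetime64[D]")


class Point:
    """Hypothetical plain Python class, used only for illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


# Arbitrary objects fall back to the generic `object` dtype (stored as references)
objs = np.array([Point(1, 2), Point(3, 4)])

print(floats.dtype)  # float64
print(bools.dtype)   # bool
print(dates.dtype)   # datetime64[D]
print(objs.dtype)    # object
```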
NumPy lets us create multi-dimensional arrays
NumPy has a ton of attributes and functions to work with multi-dimensional arrays.
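A few of those attributes and functions on a small 2-D array, as a quick sketch (the values are arbitrary). `axis=0` means "compute down the rows", which gives one result per column.

```python
import numpy as np

# A 2-D array: 2 rows x 3 columns
m = np.array([
    [1, 2, 3],
    [4, 5, 6],
])

print(m.shape)         # (2, 3)  -> size of each dimension
print(m.ndim)          # 2       -> number of dimensions
print(m.size)          # 6       -> total number of elements
print(m.sum())         # 21
print(m.mean(axis=0))  # [2.5 3.5 4.5]  -> column means
print(m.T.shape)       # (3, 2)  -> transpose swaps rows and columns
```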
Selecting Elements in Multi-Dimensional Arrays
import numpy as np

a = np.array([
    # col 0, col 1, col 2
    [1, 2, 3],  # row 0
    [4, 5, 6],  # row 1
    [7, 8, 9],  # row 2
])
a[row][element], or equivalently a[row, element] (the usual NumPy style)
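A short sketch contrasting chained indexing `a[row][element]` with NumPy's single multi-index `a[row, column]`. The chained form first builds an intermediate row and then picks from it; the multi-index does it in one step and also accepts slices in every dimension. That matters for columns: `a[:, 1]` selects column 1, while `a[:][1]` would just return row 1.

```python
import numpy as np

a = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

print(a[1][2])     # 6  -> chained indexing: row 1, then element 2
print(a[1, 2])     # 6  -> same element with a single multi-index
print(a[0])        # [1 2 3]  -> whole first row
print(a[:, 1])     # [2 5 8]  -> whole second column
print(a[0:2, 1:])  # rows 0-1, columns 1-2 -> [[2 3] [5 6]]
print(a[[0, 2]])   # rows 0 and 2, selected with a list of indices
```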
NumPy Concepts
vectorized operations and broadcasting
Vectorized operations are operations performed between arrays and arrays, or between arrays and scalars.
Basically, they allow you to perform an operation on entire arrays at once.
So instead of using a for loop to scan through an array and add to each number,
you use a vectorized operation, which is a more efficient way to do the same thing.
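A minimal sketch of the loop version versus the vectorized version, plus one broadcasting case. Broadcasting isn't defined in these notes; the short version of NumPy's rule is that shapes are compared from the rightmost dimension, and a dimension of size 1 (or a missing one) is stretched to match the other array. The arrays here are made up.

```python
import numpy as np

a = np.arange(5)  # [0 1 2 3 4]

# Loop version: visit each element in Python
looped = np.array([x + 10 for x in a])

# Vectorized version: array + scalar, one operation on the whole array
vectorized = a + 10  # [10 11 12 13 14]

# Array + array works element by element
b = np.array([10, 20, 30, 40, 50])
pairwise = a + b  # [10 21 32 43 54]

# Broadcasting: a (3, 3) array plus a (3,) row; the row is stretched across every row
grid = np.ones((3, 3))
row = np.array([1, 2, 3])
broadcast = grid + row  # every row becomes [2. 3. 4.]

print(looped, vectorized, pairwise, sep="\n")
print(broadcast)
```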
NumPy is an immutable-first library: an operation you perform on an array (e.g. a + 10) will not modify it, but will return a new array. In-place operators such as += and item assignment are the exception; they do modify the original array.
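A quick sketch of that behavior: the arithmetic expression leaves the original untouched, while the in-place operator changes it.

```python
import numpy as np

a = np.array([1, 2, 3])

b = a + 10  # returns a NEW array
print(a)    # [1 2 3]  -> original is untouched
print(b)    # [11 12 13]

a += 1      # in-place operator: modifies `a` itself
print(a)    # [2 3 4]
```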
Matplotlib has a global API and an object-oriented API for creating visualizations.
- Global API
  - Simpler for quick, straightforward plotting tasks
  - Implicitly manipulates the current figure and axes; because that state is maintained globally, it can get confusing when dealing with multiple plots or figures
- Object-oriented API
  - More explicit; gives precise control over each plot
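A minimal sketch of the same kind of plot in both styles. The Agg backend is chosen only so the snippet runs without a display, and the data and titles are made up. In the global style each `plt.*` call acts on whatever figure is "current"; in the object-oriented style you keep explicit references (`fig`, `axes`) and call methods on them.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 50)

# Global (pyplot) API: calls act on the current figure and axes
plt.figure()
plt.plot(x, x ** 2)
plt.title("Global API")

# Object-oriented API: explicit figure and axes objects
fig, axes = plt.subplots(1, 2, figsize=(8, 3))
axes[0].plot(x, np.sin(x))
axes[0].set_title("sin(x)")
axes[1].plot(x, np.cos(x))
axes[1].set_title("cos(x)")
fig.suptitle("Object-oriented API")
```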
Matplotlib supports various plot types:
Statistical analysis helps determine whether a value is valid or an outlier, depending on the context.
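One common rule of thumb for flagging candidates, not taken from these notes, is the 1.5 × IQR rule: values more than 1.5 interquartile ranges below Q1 or above Q3 are flagged. Whether a flagged value is actually invalid still depends on context. A minimal sketch with made-up numbers:

```python
import pandas as pd

# Made-up sample data with one suspiciously large value
s = pd.Series([10, 12, 11, 13, 12, 11, 95])

q1 = s.quantile(0.25)  # 11.0
q3 = s.quantile(0.75)  # 12.5
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Keep only the values outside [lower, upper]
outliers = s[(s < lower) | (s > upper)]
print(outliers)  # only the 95 is flagged
```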
You can also read data from databases (SQL databases like Postgres, etc.)
So basically, you can import data from Excel, CSV, databases, and HTML into a pandas DataFrame
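pandas has one reader per source, for example `read_csv`, `read_excel`, `read_html`, and `read_sql` (`read_excel` and `read_html` need extra packages such as openpyxl or lxml). A minimal sketch using `read_csv` on an in-memory string so it runs without a file; the columns and values are made up.

```python
import io

import pandas as pd

# Made-up CSV content; in practice you would pass a file path or URL instead
csv_text = """name,age,city
Alice,30,Lisbon
Bob,25,Oslo
Carol,35,Lima
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (3, 3)
print(df["age"].mean())  # 30.0
```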
.execute(): a method that allows you to execute SQL queries against the database.
It's a fundamental step in interacting with databases programmatically.
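In Python's DB-API, `.execute()` is a method on a cursor object. A minimal sketch using the standard-library `sqlite3` module with an in-memory database, so no server is needed; a Postgres driver such as psycopg2 follows the same connect / cursor / execute pattern. The `sales` table is hypothetical. The last step passes the same connection to `pandas.read_sql_query` to get a DataFrame directly.

```python
import sqlite3

import pandas as pd

# In-memory database: nothing is written to disk
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table, created and filled through .execute() / .executemany()
cur.execute("CREATE TABLE sales (product TEXT, amount REAL)")
cur.executemany(
    "INSERT INTO sales (product, amount) VALUES (?, ?)",
    [("apple", 1.5), ("pear", 2.0), ("apple", 3.0)],
)
conn.commit()

# Run a query and fetch the result rows as tuples
cur.execute(
    "SELECT product, SUM(amount) FROM sales GROUP BY product ORDER BY product"
)
rows = cur.fetchall()
print(rows)  # [('apple', 4.5), ('pear', 2.0)]

# Or let pandas run the query and build a DataFrame
df = pd.read_sql_query("SELECT * FROM sales", conn)
print(df.shape)  # (3, 2)

conn.close()
```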